OCPBUGS-77827: fix(api): add missing has() guards to servingCerts CEL validation rule#8331
OCPBUGS-77827: fix(api): add missing has() guards to servingCerts CEL validation rule#8331rutvik23 wants to merge 2 commits intoopenshift:mainfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
Skipping CI for Draft Pull Request. |
|
@rutvik23: This pull request references Jira Issue OCPBUGS-77827, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (yli2@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: ⛔ Files ignored due to path filters (17)
📒 Files selected for processing (1)
📝 WalkthroughWalkthroughA kubebuilder Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 2 inconclusive)
✅ Passed checks (9 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@rutvik23: This pull request references Jira Issue OCPBUGS-77827, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (yli2@redhat.com), skipping review request. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
api/hypershift/v1beta1/hostedcluster_types.go (1)
527-527: Guards correctly fix the reported "no such key" failures; consider hoisting the configurationhas()checks out ofservices.exists()for readability and CEL cost.The added
has()guards address the issue correctly: thanks to&&short-circuit evaluation,self.configuration.apiServer.servingCerts.namedCertificatesandcert.namesare only dereferenced after their existence is confirmed. This handles both failure modes: "no such key: servingCerts" on upgrade-from-older-CRD, and "no such key: namedCertificates" on create.One optional refinement: the four
has(self.configuration...)checks do not depend onsand are currently re-evaluated for every services entry inside theexistspredicate. Hoisting them out keeps the inner predicate focused on the per-service check, is easier to read, and reduces CEL cost (relevant given the existing cost-budget caveat noted on Line 669).♻️ Proposed refactor (equivalent behavior, cheaper and more readable)
-// +kubebuilder:validation:XValidation:rule=`!self.services.exists(s, s.service == 'APIServer' && has(s.servicePublishingStrategy.loadBalancer) && s.servicePublishingStrategy.loadBalancer.hostname != "" && has(self.configuration) && has(self.configuration.apiServer) && has(self.configuration.apiServer.servingCerts) && has(self.configuration.apiServer.servingCerts.namedCertificates) && self.configuration.apiServer.servingCerts.namedCertificates.exists(cert, has(cert.names) && cert.names.exists(n, n == s.servicePublishingStrategy.loadBalancer.hostname)))`, message="APIServer loadBalancer hostname cannot be in ClusterConfiguration.apiserver.servingCerts.namedCertificates[]" +// +kubebuilder:validation:XValidation:rule=`!has(self.configuration) || !has(self.configuration.apiServer) || !has(self.configuration.apiServer.servingCerts) || !has(self.configuration.apiServer.servingCerts.namedCertificates) || !self.services.exists(s, s.service == 'APIServer' && has(s.servicePublishingStrategy.loadBalancer) && s.servicePublishingStrategy.loadBalancer.hostname != "" && self.configuration.apiServer.servingCerts.namedCertificates.exists(cert, has(cert.names) && cert.names.exists(n, n == s.servicePublishingStrategy.loadBalancer.hostname)))`, message="APIServer loadBalancer hostname cannot be in ClusterConfiguration.apiserver.servingCerts.namedCertificates[]"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/hypershift/v1beta1/hostedcluster_types.go` at line 527, Hoist the configuration existence guards out of the self.services.exists() predicate: first check has(self.configuration), has(self.configuration.apiServer), has(self.configuration.apiServer.servingCerts) and has(self.configuration.apiServer.servingCerts.namedCertificates) once before calling self.services.exists(), then keep the inner predicate focused on per-service fields (e.g. s.service == 'APIServer' && has(s.servicePublishingStrategy.loadBalancer) && s.servicePublishingStrategy.loadBalancer.hostname != "" && self.configuration.apiServer.servingCerts.namedCertificates.exists(cert, has(cert.names) && cert.names.exists(n, n == s.servicePublishingStrategy.loadBalancer.hostname))). This preserves the current short-circuit safety for cert.names while reducing repeated evaluation of the four has(...) checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@api/hypershift/v1beta1/hostedcluster_types.go`:
- Line 527: Hoist the configuration existence guards out of the
self.services.exists() predicate: first check has(self.configuration),
has(self.configuration.apiServer),
has(self.configuration.apiServer.servingCerts) and
has(self.configuration.apiServer.servingCerts.namedCertificates) once before
calling self.services.exists(), then keep the inner predicate focused on
per-service fields (e.g. s.service == 'APIServer' &&
has(s.servicePublishingStrategy.loadBalancer) &&
s.servicePublishingStrategy.loadBalancer.hostname != "" &&
self.configuration.apiServer.servingCerts.namedCertificates.exists(cert,
has(cert.names) && cert.names.exists(n, n ==
s.servicePublishingStrategy.loadBalancer.hostname))). This preserves the current
short-circuit safety for cert.names while reducing repeated evaluation of the
four has(...) checks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 09370bd4-c0e3-477e-8ee3-6bf13e80a1bf
⛔ Files ignored due to path filters (18)
api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/AAA_ungated.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/AutoNodeKarpenter.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ClusterUpdateAcceptRisks.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ClusterVersionOperatorConfiguration.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDC.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUIDAndExtraClaimMappings.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUpstreamParity.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/GCPPlatform.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/HCPEtcdBackup.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/HyperShiftOnlyDynamicResourceAllocation.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ImageStreamImportMode.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/KMSEncryptionProvider.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/OpenStack.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**cmd/install/assets/crds/hypershift-operator/tests/hostedclusters.hypershift.openshift.io/stable.hostedclusters.services.testsuite.yamlis excluded by!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-CustomNoUpgrade.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-Default.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-TechPreviewNoUpgrade.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamlvendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_types.gois excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (2)
api/hypershift/v1beta1/hostedcluster_types.gotest/e2e/nodepool_test.go
|
/cc @enxebre /cc @jparrill /cc @bryan-cox |
|
@rutvik23: GitHub didn't allow me to request PR reviews from the following users: /cc. Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8331 +/- ##
=======================================
Coverage 37.44% 37.44%
=======================================
Files 751 751
Lines 91969 91969
=======================================
Hits 34435 34435
Misses 54894 54894
Partials 2640 2640
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
jparrill
left a comment
There was a problem hiding this comment.
Thanks for the PR!. Dropped a comment
| - anything | ||
| - kas.duplicated.hostname.com | ||
| expectedError: "loadBalancer hostname cannot be in ClusterConfiguration.apiserver.servingCerts.namedCertificates" | ||
| - name: When loadBalancer hostname is set and apiServer has no servingCerts it should pass |
There was a problem hiding this comment.
For now in the envtest, is only covered the first has(). Do you mind add two more test cases to independently exercise each has() guard:
- "When loadBalancer hostname is set and apiServer has servingCerts but no namedCertificates it should pass" -- set
servingCerts: {}without namedCertificates. This exercises thehas(self.configuration.apiServer.servingCerts.namedCertificates)guard. - "When loadBalancer hostname is set and namedCertificates entry has no names it should pass" -- add a namedCertificates entry with only servingCertificate and no names field. This exercises the
has(cert.names)guard.
|
Thanks! I think this requires several fixes:
Can you please report/PR against o/api to make sure ServingCerts APIServerServingCerts Also it seems NamedCertificates should be required here: Then in this repo, is there a real use case to allow loadBalancer.hostname with out configuration.apiServer.servingCerts.namedCertificates[0].Names? if there isn't, then this can be required by the cel rule cc @JoelSpeed |
This API is old, changing the serialization now is not an option. Adding omitzero to newer clients will cause conflicts between old and new clients as some would attempt to drop the empty |
It seems the users are allowed to set |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: enxebre, rutvik23 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Scheduling tests matching the |
|
New changes are detected. LGTM label has been removed. |
AI Test Failure AnalysisJob: Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6 |
AI Test Failure AnalysisJob: Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6 |
AI Test Failure AnalysisJob: Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6 |
|
@rutvik23: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
| // +kubebuilder:validation:XValidation:rule=`self.platform.type == "Azure" ? self.services.exists(s, s.service == "Ignition" && s.servicePublishingStrategy.type == "Route") : true`,message="Azure platform requires Ignition to use Route service publishing strategy" | ||
| // +kubebuilder:validation:XValidation:rule=`has(self.issuerURL) || !has(self.serviceAccountSigningKey)`,message="If serviceAccountSigningKey is set, issuerURL must be set" | ||
| // +kubebuilder:validation:XValidation:rule=`!self.services.exists(s, s.service == 'APIServer' && has(s.servicePublishingStrategy.loadBalancer) && s.servicePublishingStrategy.loadBalancer.hostname != "" && has(self.configuration) && has(self.configuration.apiServer) && self.configuration.apiServer.servingCerts.namedCertificates.exists(cert, cert.names.exists(n, n == s.servicePublishingStrategy.loadBalancer.hostname)))`, message="APIServer loadBalancer hostname cannot be in ClusterConfiguration.apiserver.servingCerts.namedCertificates[]" | ||
| // +kubebuilder:validation:XValidation:rule=`!self.services.exists(s, s.service == 'APIServer' && has(s.servicePublishingStrategy.loadBalancer) && s.servicePublishingStrategy.loadBalancer.hostname != "" && has(self.configuration) && has(self.configuration.apiServer) && has(self.configuration.apiServer.servingCerts) && has(self.configuration.apiServer.servingCerts.namedCertificates) && self.configuration.apiServer.servingCerts.namedCertificates.exists(cert, has(cert.names) && cert.names.exists(n, n == s.servicePublishingStrategy.loadBalancer.hostname)))`, message="APIServer loadBalancer hostname cannot be in ClusterConfiguration.apiserver.servingCerts.namedCertificates[]" |
There was a problem hiding this comment.
Have you considered using optional types here to remove all the has blocks
| // +kubebuilder:validation:XValidation:rule=`!self.services.exists(s, s.service == 'APIServer' && has(s.servicePublishingStrategy.loadBalancer) && s.servicePublishingStrategy.loadBalancer.hostname != "" && has(self.configuration) && has(self.configuration.apiServer) && has(self.configuration.apiServer.servingCerts) && has(self.configuration.apiServer.servingCerts.namedCertificates) && self.configuration.apiServer.servingCerts.namedCertificates.exists(cert, has(cert.names) && cert.names.exists(n, n == s.servicePublishingStrategy.loadBalancer.hostname)))`, message="APIServer loadBalancer hostname cannot be in ClusterConfiguration.apiserver.servingCerts.namedCertificates[]" | |
| // +kubebuilder:validation:XValidation:rule=`!self.services.exists(s, s.service == 'APIServer' && has(s.servicePublishingStrategy.loadBalancer) && s.?servicePublishingStrategy.loadBalancer.hostname.orValue("") != "" && self.?configuration.apiServer.servingCerts.namedCertificates.orValue("[]").exists(cert, cert.?names.orValue("[]").exists(n, n == s.servicePublishingStrategy.loadBalancer.hostname)))`, message="APIServer loadBalancer hostname cannot be in ClusterConfiguration.apiserver.servingCerts.namedCertificates[]" |
There was a problem hiding this comment.
@JoelSpeed I tried using CEL optional types as you suggested; but then make update & CRD generation fails consistently on this cost-estimation error: OpenAPIv3 schema, spec.validation.openAPIV3Schema: Forbidden: x-kubernetes-validations estimated rule cost total for entire OpenAPIv3 schema exceeds budget by factor of more than 100x (try simplifying the rule, or adding maxItems, maxProperties, and maxLength where arrays, maps, and strings are declared)]
- upon digging more in this context, it turned out that CEL optional types (.?/.orValue()) have significantly higher estimated cost in the Kubernetes CRD cost estimator compared to has() guards. Hence, the has() guards worked out properly in local testing & make update also didn't fail.
- The HostedCluster CRD is massive with many validation rules, so it seem to be having very little cost headroom. We've had similar issue in this repo earlier -> NO-JIRA: fix(api): replace CEL url() validators with regex to fix CRD cost budget
So I think we're good to go with has() guards for now.
There was a problem hiding this comment.
have significantly higher estimated cost
Do you have any sources/links to explain this? Since it's a presence check, it should be fixed (low) cost 🤔
There was a problem hiding this comment.
The root cause is likely in the cel-go cost estimator (cel-go/checker/cost.go); let me know if you agree with this hypothesis:
Path tracking lost with .?: Regular field selects track schema paths viacostSelect → addPath()(https://github.dev/google/cel-go/blob/master/checker/cost.go#L507-L528), which allows the estimator to look up maxItems from the CRD schema. However,.? (optional select) is dispatched as a CallKind, and costCall(https://github.dev/google/cel-go/blob/master/checker/cost.go#L531) has no path-building for the select_optional_field overload; so the schema path is lost after ?.orValue() returns no ResultSize: It falls through to the generic O(1) case (https://github.dev/google/cel-go/blob/master/checker/cost.go#L866) with no result size metadata, sosizeOrUnknown returns {Min: 0, Max: math.MaxUint64}.- Unbounded size multiplied through nested exists(): The comprehension cost (https://github.dev/google/cel-go/blob/master/checker/cost.go#L688) computes
rangeCnt.MultiplyByCost(stepCost). With bothnamedCertificates.orValue([]) and cert.?names.orValue([])havingunbounded rangesizes, the nested loops produceMaxUint64 * MaxUint64 → overflow.
Comparison
| Aspect | has() guards | .?/.orValue() |
|---|---|---|
| AST kind | SelectKind (test-only) | CallKind |
| Path tracking | Preserved | Lost — no path for select_optional_field |
| List size for exists() | Bounded by schema | Max: math.MaxUint64 |
| Nested loop cost | bounded * bounded | MaxUint64 * MaxUint64 |
There was a problem hiding this comment.
Thanks for checking, this feels like a bug in the estimator to me, lets continue with what you have for now
|
I now have all the evidence needed for a complete analysis. Here is the report: Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryNo Prow CI test jobs have failed. The error state is exclusively from the tide merge controller, which reports the PR has a git merge conflict with the Root CauseThe tide "error" state is not a test failure — it is a merge-gate status indicating the PR cannot be merged in its current state due to two independent issues:
Both issues must be resolved before tide will attempt to merge the PR. The e2e jobs that appear as "pending" are not failing — they have never been triggered. Recommendations
Evidence
|
The CEL rule validating that APIServer loadBalancer hostname is not in servingCerts namedCertificates fails with "no such key" when servingCerts or namedCertificates are not set. This adds has() guards for servingCerts, namedCertificates, and cert.names fields to prevent the error. Fixes: OCPBUGS-77827 Signed-off-by: rutvik23 <rkshirsa@redhat.com>
Add two additional envtest cases to exercise each has() guard added in the servingCerts CEL validation fix: - servingCerts present but no namedCertificates: exercises has(self.configuration.apiServer.servingCerts.namedCertificates) - namedCertificates entry with no names field: exercises has(cert.names) Signed-off-by: rutvik23 <rkshirsa@redhat.com> Commit-Message-Assisted-by: Claude (via Claude Code)
What this PR does / why we need it:
Adds missing
has()guards to the CEL validation rule that checks whether an APIServerloadBalancer.hostnameconflicts withservingCerts.namedCertificates[].namesThe rule was introduced in OCPBUGS-56725: Lb hostname certs cel #6194 (OCPBUGS-56725) but assumed
servingCertsandnamedCertificateswould always be present. In practice:servingCertsis a non-pointer struct (serialized asservingCerts: {}),but
namedCertificates(omitempty slice) is absent →no such key: namedCertificatesservingCertsatall →
no such key: servingCertsloadBalancer.hostnameon the APIServer service withconfiguration.apiServer(e.g.additionalCORSAllowedOrigins) but withoutservingCerts.Design decision
&&chains. Addinghas()guards forservingCerts,namedCertificates,and
cert.namesbefore accessing them is the CEL pattern to handle optional fields safely. This is consistent with the existinghas()guards already in the same rule forloadBalancer,configuration, andapiServer.Which issue(s) this PR fixes:
Fixes OCPBUGS-77827
Special notes for your reviewer:
has()guards were added:has(self.configuration.apiServer.servingCerts),has(self.configuration.apiServer.servingCerts.namedCertificates), andhas(cert.names).Checklist:
Summary by CodeRabbit